When to look at a noisy Markov chain in sequential decision making if measurements are costly?
A decision maker records measurements of a finite-state Markov chain
corrupted by noise. The goal is to decide when the Markov chain hits a specific
target state. The decision maker can choose from a finite set of sampling
intervals to pick the next time to look at the Markov chain. The aim is to
optimize an objective comprising false-alarm, delay, and cumulative
measurement-sampling costs. Taking more frequent measurements yields accurate
estimates but incurs a higher measurement cost. Making an erroneous decision
too soon incurs a false alarm penalty. Waiting too long to declare the target
state incurs a delay penalty. What is the optimal sequential strategy for the
decision maker? The paper shows that under reasonable conditions, the optimal
strategy has the following intuitive structure: when the Bayesian estimate
(posterior distribution) of the Markov chain is away from the target state,
look less frequently; while if the posterior is close to the target state, look
more frequently. Bounds are derived for the optimal strategy, and the
achievable optimal cost of the sequential detector is analyzed as a function of
the transition dynamics and observation distribution. The sensitivity of the
optimal achievable cost to parameter variations is bounded in terms of the
Kullback-Leibler divergence. To prove these results, novel stochastic
dominance results on the Bayesian filtering recursion are derived. The
formulation in this paper generalizes quickest time change detection to
consider optimal sampling and also yields useful results in sensor scheduling
(active sensing).
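The Bayesian filtering recursion at the heart of this formulation can be sketched in a few lines. The transition matrix, observation likelihoods, and the threshold sampling policy below are illustrative stand-ins, not the paper's model:

```python
import numpy as np

# Illustrative two-state chain: state 1 is the absorbing target state.
P = np.array([[0.95, 0.05],
              [0.00, 1.00]])          # P[i, j] = Prob(next state j | state i)
B = np.array([[0.8, 0.2],
              [0.3, 0.7]])            # B[x, y] = Prob(observation y | state x)

def filter_update(pi, y, delta):
    """Propagate the posterior pi over a sampling interval of delta steps
    (Chapman-Kolmogorov), then apply Bayes' rule for the observation y."""
    pred = np.linalg.matrix_power(P, delta).T @ pi
    unnorm = B[:, y] * pred
    return unnorm / unnorm.sum()

def sampling_interval(pi, threshold=0.5):
    """Illustrative threshold policy with the structure described above:
    sample sooner (interval 1) when the posterior mass on the target state
    is high, otherwise wait longer (interval 4)."""
    return 1 if pi[1] > threshold else 4

pi = np.array([1.0, 0.0])             # start in state 0 with certainty
pi = filter_update(pi, y=1, delta=sampling_interval(pi))
print(pi)                             # posterior after one costly measurement
```

Each costly look thus trades off the information gained by the Bayes update against the measurement cost accrued over the chosen interval.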
Reinforcement Learning: Stochastic Approximation Algorithms for Markov Decision Processes
This article presents a concise description of stochastic approximation
algorithms in reinforcement learning of Markov decision processes. These
algorithms can also be used as suboptimal methods for partially observed
Markov decision processes.
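As a concrete instance of such a stochastic approximation algorithm, here is a minimal Q-learning sketch on a toy MDP; the dynamics, rewards, exploration rate, and step-size schedule are made-up choices for illustration:

```python
import numpy as np

# Q-learning -- a stochastic approximation algorithm -- on a toy
# 2-state, 2-action MDP. All numbers below are illustrative.
rng = np.random.default_rng(0)
n_states, n_actions, gamma = 2, 2, 0.8

def step(state, action):
    """Toy MDP: the next state is random; action 1 pays more on average."""
    next_state = int(rng.integers(n_states))
    reward = (0.8 if action == 1 else 0.2) + 0.1 * rng.standard_normal()
    return next_state, reward

Q = np.zeros((n_states, n_actions))
visits = np.zeros((n_states, n_actions))
s = 0
for _ in range(20000):
    # epsilon-greedy exploration
    a = int(rng.integers(n_actions)) if rng.random() < 0.1 else int(Q[s].argmax())
    s_next, r = step(s, a)
    visits[s, a] += 1
    alpha = 1.0 / visits[s, a] ** 0.6     # diminishing step size
    # stochastic approximation update toward the Bellman target
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    s = s_next
```

The diminishing step size is what makes this a stochastic approximation scheme: the noisy Bellman targets are averaged out over time.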
Quickest Detection with Social Learning: Interaction of local and global decision makers
We consider how local and global decision policies interact in stopping time
problems such as quickest time change detection. Individual agents make myopic
local decisions via social learning, that is, each agent records a private
observation of a noisy underlying state process, selfishly optimizes its local
utility and then broadcasts its local decision. Given these local decisions,
how can a global decision maker achieve quickest time change detection when the
underlying state changes according to a phase-type distribution? The paper
presents four results. First, using Blackwell dominance of measures, it is
shown that the optimal cost incurred in social learning based quickest
detection is always larger than that of classical quickest detection. Second,
it is shown that in general the optimal decision policy for social learning
based quickest detection is characterized by multiple thresholds within the
space of Bayesian distributions. Third, using lattice programming and
stochastic dominance, sufficient conditions are given for the optimal decision
policy to consist of a single linear hyperplane, or, more generally, a
threshold curve. Estimation of the optimal linear approximation to this
threshold curve is formulated as a simulation-based stochastic optimization
problem. Finally, the paper shows that in multi-agent sensor management with
quickest detection, where each agent views the world according to its prior,
the optimal policy has a similar structure to social learning.
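The social-learning protocol described above (private observation, myopic local decision, broadcast) can be sketched for a two-state, two-action example; all likelihoods and costs are illustrative, not the paper's:

```python
import numpy as np

# Two states (e.g. change / no change), two local actions.
P_obs = np.array([[0.8, 0.2],   # P_obs[x, y] = Prob(private obs y | state x)
                  [0.2, 0.8]])
cost = np.array([[0.0, 1.0],    # cost[x, a] = local cost of action a in state x
                 [1.0, 0.0]])

def myopic_action(public_belief, y):
    """An agent fuses the public belief with its private observation y and
    selfishly picks the action with the smaller expected local cost."""
    post = P_obs[:, y] * public_belief
    post /= post.sum()
    return int((post @ cost).argmin())

def public_update(public_belief, a):
    """Other agents see only the broadcast action a, so they update the
    public belief with the action likelihood P(a | x) induced by the
    myopic rule -- this is the social-learning filter."""
    lik = np.array([sum(P_obs[x, y] for y in range(2)
                        if myopic_action(public_belief, y) == a)
                    for x in range(2)])
    post = lik * public_belief
    return post / post.sum() if post.sum() > 0 else public_belief

pi = np.array([0.5, 0.5])       # uninformative public belief
pi = public_update(pi, a=0)     # one agent broadcasts action 0
```

Note that once the public belief is confident enough, both private observations map to the same action, the action likelihood becomes uninformative, and the public belief freezes: an information cascade, which is what makes social-learning-based detection costlier than classical detection.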
Partially Observed Markov Decision Processes. Problem Sets and Internet Supplement
This document is an internet supplement to my book "Partially Observed Markov
Decision Processes - From Filtering to Controlled Sensing" published by
Cambridge University Press in 2016. This internet supplement contains
exercises, examples and case studies. The material appears in this internet
supplement (instead of the book) so that it can be updated: the supplement is a
work in progress and will be revised periodically, with further discussion and
examples added over time. I welcome constructive comments from readers of the
book and this internet supplement.
Controlled Sequential Information Fusion with Social Sensors
A sequence of social sensors estimate an unknown parameter (modeled as a
state of nature) by performing Bayesian Social Learning, and myopically
optimize individual reward functions. The decisions of the social sensors
contain quantized information about the underlying state. How should a fusion
center dynamically incentivize the social sensors for acquiring information
about the underlying state? This paper presents five results. First, sufficient
conditions on the model parameters are provided under which the optimal policy
for the fusion center has a threshold structure. The optimal policy is
determined in closed form, and is such that it switches between two exactly
specified incentive policies at the threshold. Second, it is shown that the
optimal incentive sequence is a sub-martingale, i.e., the optimal incentives
increase on average over time. Third, it is shown that it is possible for the
fusion center to learn the true state asymptotically by employing a sub-optimal
policy; in other words, controlled information fusion with social sensors can
be consistent. Fourth, uniform bounds on the average additional cost incurred
by the fusion center for employing a sub-optimal policy are provided. This
characterizes the trade-off between the cost of information acquisition and
consistency for the fusion center. Finally, when it is sufficient to estimate
the state with a degree of confidence, uniform bounds on the budget saved by
employing policies that guarantee state estimation in finite time are provided.
Average-Consensus Algorithms in a Deterministic Framework
We consider the average-consensus problem in a multi-node network of finite
size. Communication between nodes is modeled by a sequence of directed signals
with arbitrary communication delays. Four distributed algorithms that achieve
average-consensus are proposed. Necessary and sufficient communication
conditions are given for each algorithm to achieve average-consensus. Resource
costs for each algorithm are derived based on the number of scalar values that
are required for communication and storage at each node. Numerical examples are
provided to illustrate the empirical convergence rate of the four algorithms in
comparison with a well-known "gossip" algorithm as well as a randomized
information spreading algorithm when assuming a fully connected random graph
with instantaneous communication.
Comment: 53 pages, 2 figures, 1 table. Short version submitted to IEEE Trans.
Signal Processing.
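A minimal sketch of the linear average-consensus iteration these algorithms generalize: each node repeatedly replaces its value with a weighted average of its neighbours' values, using a doubly stochastic weight matrix. The 4-node path graph below is illustrative and ignores the directed signals and delays treated in the paper:

```python
import numpy as np

x = np.array([1.0, 3.0, 5.0, 7.0])    # initial node values; average is 4.0
W = np.array([[0.5, 0.5, 0.0, 0.0],   # doubly stochastic (Metropolis-style)
              [0.5, 0.25, 0.25, 0.0],  # weights on a 4-node path graph
              [0.0, 0.25, 0.25, 0.5],
              [0.0, 0.0, 0.5, 0.5]])

for _ in range(200):
    x = W @ x                          # each node averages with its neighbours

print(x)                               # every node approaches the average 4.0
```

Double stochasticity of W is what preserves the sum (hence the average) at every iteration, while connectivity of the graph drives all nodes to agreement.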
Dependence Structure Analysis Of Meta-level Metrics in YouTube Videos: A Vine Copula Approach
This paper uses vine copula to analyze the multivariate statistical
dependence in a massive YouTube dataset consisting of 6 million videos over 25
thousand channels. Specifically we study the statistical dependency of 7
YouTube meta-level metrics: view count, number of likes, number of comments,
length of video title, number of subscribers, click rates, and average
percentage watching. Dependency parameters such as the Kendall's tau and tail
dependence coefficients are computed to evaluate the pair-wise dependence of
these meta-level metrics. The vine copula model yields several interesting
dependency structures. We show that view count and number of likes are in the
central position of the dependence structure. Conditioned on these two metrics,
the other five meta-level metrics are virtually independent of each other.
Also, Sports, Gaming, Fashion, and Comedy videos have similar dependence
structures, while the News category exhibits strong tail dependence. We
also study Granger causality effects and upload dynamics and their impact on
view count. Our findings provide a useful understanding of user engagement in
YouTube.
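The pairwise dependence measure used above, Kendall's tau, can be computed directly as the normalized difference between concordant and discordant pairs; the tiny sample below is made up for illustration:

```python
from itertools import combinations

def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / total pairs,
    with no correction for ties."""
    n = len(x)
    s = sum((1 if (x[i] - x[j]) * (y[i] - y[j]) > 0 else
             -1 if (x[i] - x[j]) * (y[i] - y[j]) < 0 else 0)
            for i, j in combinations(range(n), 2))
    return s / (n * (n - 1) / 2)

# made-up metric values for five videos
views = [100, 250, 30, 900, 400]
likes = [10, 30, 2, 95, 35]           # same ranking as views
print(kendall_tau(views, likes))      # prints 1.0 (perfect concordance)
```

A tau near 1 (as here) indicates the two metrics rank videos almost identically, which is the kind of pairwise dependence the copula model is built on.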
How to Calibrate your Adversary's Capabilities? Inverse Filtering for Counter-Autonomous Systems
We consider an adversarial Bayesian signal processing problem involving "us"
and an "adversary". The adversary observes our state in noise; updates its
posterior distribution of the state and then chooses an action based on this
posterior. Given knowledge of "our" state and sequence of adversary's actions
observed in noise, we consider three problems: (i) How can the adversary's
posterior distribution be estimated? Estimating the posterior is an inverse
filtering problem involving a random measure - we formulate and solve several
versions of this problem in a Bayesian setting. (ii) How can the adversary's
observation likelihood be estimated? This tells us how accurate the adversary's
sensors are. We compute the maximum likelihood estimator for the adversary's
observation likelihood given our measurements of the adversary's actions where
the adversary's actions are in response to estimating our state. (iii) How can
the state be chosen by us to minimize the covariance of the estimate of the
adversary's observation likelihood? "Our" state can be viewed as a probe signal
which causes the adversary to act; so choosing the optimal state sequence is an
input design problem. The above questions are motivated by the design of
counter-autonomous systems: given measurements of the actions of a
sophisticated autonomous adversary, how can our counter-autonomous system
estimate the underlying belief of the adversary, predict future actions and
therefore guard against these actions.
Adaptive Polling in Hierarchical Social Networks using Blackwell Dominance
Consider a population of individuals that observe an underlying state of
nature that evolves over time. The population is classified into different
levels depending on the hierarchical influence that dictates how the
individuals at each level form an opinion on the state. The population is
sampled sequentially by a pollster and the nodes (or individuals) respond to
the questions asked by the pollster. This paper considers the following
problem: How should the pollster poll the hierarchical social network to
estimate the state while minimizing the polling cost (measurement cost and
uncertainty in the Bayesian state estimate)? This paper proposes adaptive
versions of the following polling methods: Intent Polling, Expectation Polling,
and the recently proposed Neighbourhood Expectation Polling to account for the
time varying state of nature and the hierarchical influence in social networks.
The adaptive polling problem in a hierarchical social network is formulated as
a partially observed Markov decision process (POMDP). Our main results exploit
the structure of the polling problem, and determine novel conditions for
Blackwell dominance to construct myopic policies that provably upper bound the
optimal policy of the adaptive polling POMDP. The Le Cam deficiency is used to
determine approximate Blackwell dominance for general polling problems. These
Blackwell dominance conditions also facilitate the comparison of the Rényi
divergence and Shannon capacity of more general channel structures that arise
in hierarchical social networks. Numerical examples are provided to illustrate
the adaptive polling policies with parameters estimated from YouTube data.
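Blackwell dominance of observation kernels reduces to a linear feasibility check: kernel A dominates B exactly when B = A G for some row-stochastic garbling matrix G. A sketch of that check (with illustrative kernels, not the paper's conditions):

```python
import numpy as np
from scipy.optimize import linprog

def blackwell_dominates(A, B):
    """Return True iff there is a row-stochastic G >= 0 with A @ G == B,
    posed as a linear feasibility program (zero objective)."""
    n, m = A.shape
    _, p = B.shape
    n_vars = m * p                       # unknowns: G[j, l] at index j*p + l
    A_eq = np.zeros((n * p + m, n_vars))
    b_eq = np.zeros(n * p + m)
    for i in range(n):                   # constraints (A G)[i, l] = B[i, l]
        for l in range(p):
            for j in range(m):
                A_eq[i * p + l, j * p + l] = A[i, j]
            b_eq[i * p + l] = B[i, l]
    for j in range(m):                   # each row of G sums to one
        A_eq[n * p + j, j * p:(j + 1) * p] = 1.0
        b_eq[n * p + j] = 1.0
    res = linprog(np.zeros(n_vars), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, 1)] * n_vars)
    return res.status == 0               # 0 = feasible, 2 = infeasible

A = np.array([[0.9, 0.1],
              [0.1, 0.9]])               # more informative kernel
G = np.array([[0.8, 0.2],
              [0.3, 0.7]])               # garbling matrix
B = A @ G                                # B is a garbling of A by construction
```

Since B above is built as a garbling of A, `blackwell_dominates(A, B)` is feasible, while the reverse direction is not: the dominated kernel cannot reproduce the more informative one.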
Sequential Detection of Market shocks using Risk-averse Agent Based Models
This paper considers a statistical signal processing problem involving agent
based models of financial markets which at a micro-level are driven by socially
aware and risk-averse trading agents. These agents trade (buy or sell) stocks
by exploiting information about the decisions of previous agents (social
learning) via an order book in addition to a private (noisy) signal they
receive on the value of the stock. We are interested in the following: (1)
modelling the dynamics of these risk-averse agents, and (2) sequential
detection of a market shock based on the behaviour of these agents. Structural
results which
characterize social learning under the CVaR (Conditional Value-at-Risk) risk
measure are presented, and a formulation of the Bayesian change-point detection
problem is provided. The structural results exhibit two interesting properties:
(i) risk-averse agents herd more often than risk-neutral agents, and (ii) the
stopping set in the sequential detection problem is non-convex. The framework
is validated on data from the Yahoo! Tech Buzz game dataset.
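The CVaR risk measure that drives these agents has a simple empirical form: the average loss in the worst (1 - alpha) tail of the sample. A minimal sketch on made-up loss data:

```python
import numpy as np

def cvar(losses, alpha=0.95):
    """Empirical CVaR: mean of the losses at or above the
    alpha-quantile (the Value-at-Risk)."""
    losses = np.sort(np.asarray(losses))
    k = int(np.ceil(alpha * len(losses)))
    return losses[k:].mean() if k < len(losses) else losses[-1]

losses = np.arange(1, 101, dtype=float)   # made-up losses 1..100
print(cvar(losses, alpha=0.95))           # mean of the worst 5 losses: 98.0
```

Because CVaR weights only the worst outcomes, an agent optimizing it discounts its private signal more heavily than a risk-neutral agent, which is consistent with the increased herding noted above.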